Girsanov Based Direct Policy Gradient Methods

Authors

  • Evangelos A. Theodorou
  • Emo Todorov
Abstract

Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of work in this area relies on discrete-time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state-action spaces and continuous time. The derivation is based on successive applications of Girsanov's theorem and the use of the Radon-Nikodym derivative as formulated for Markov diffusion processes. The resulting policy gradient is reward-weighted, with the reward taking the form of a path integral. We apply the resulting algorithm in two simple examples for learning attractor landscapes in rhythmic and discrete movements.
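The abstract only sketches the estimator, so here is a minimal, hedged illustration of the idea it describes: for a simple one-dimensional controlled diffusion, the score of a sampled path with respect to the policy parameters follows from Girsanov's theorem (the derivative of the log Radon-Nikodym derivative of the controlled path measure), and the policy gradient is the reward-weighted average of that score, with the reward accumulated as a path integral. The dynamics, features, reward, and step sizes below are illustrative assumptions, not the authors' setup.

```python
import numpy as np

# Minimal sketch (not the authors' implementation): a Monte-Carlo policy-gradient
# estimator for a 1-D controlled diffusion
#     dx = u_theta(x) dt + sigma dW,
# where the likelihood ratio between path measures for different policy parameters
# comes from Girsanov's theorem. The score of a sampled path with respect to theta
# is the stochastic integral of (d u / d theta) / sigma against the realized noise,
# and the "reward" is a path integral of a running reward q(x).

sigma = 0.5          # diffusion coefficient (assumed known)
dt, T = 0.01, 1.0    # Euler-Maruyama step and horizon
steps = int(T / dt)

def features(x):
    # simple features for the policy u(x) = theta^T phi(x); illustrative choice
    return np.array([x, np.exp(-x ** 2)])

def running_reward(x):
    # example running reward: attract the state toward x = 1
    return -(x - 1.0) ** 2

def rollout(theta, rng):
    """Simulate one path; return (path-integral reward, Girsanov score w.r.t. theta)."""
    x, R = 0.0, 0.0
    score = np.zeros_like(theta)
    for _ in range(steps):
        phi = features(x)
        u = theta @ phi
        dW = rng.normal(0.0, np.sqrt(dt))   # Brownian increment actually realized
        R += running_reward(x) * dt          # reward as a path integral
        score += phi * dW / sigma            # d/dtheta of the log path density (Girsanov)
        x = x + u * dt + sigma * dW          # Euler-Maruyama step
    return R, score

def policy_gradient(theta, n_paths=500, seed=0):
    """Reward-weighted gradient estimate: mean of (R - baseline) * score."""
    rng = np.random.default_rng(seed)
    Rs, scores = zip(*(rollout(theta, rng) for _ in range(n_paths)))
    Rs, scores = np.array(Rs), np.array(scores)
    baseline = Rs.mean()                     # simple variance-reduction baseline
    return ((Rs - baseline)[:, None] * scores).mean(axis=0)

theta = np.zeros(2)
for it in range(50):
    theta += 0.1 * policy_gradient(theta, seed=it)
print("learned parameters:", theta)
```

The accumulated term `phi * dW / sigma` is the derivative of the log Radon-Nikodym derivative with respect to theta, which is why no explicit per-step action likelihoods appear in the estimator.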

Similar articles

Absolute Continuity of Symmetric Markov Processes

We study Girsanov’s theorem in the context of symmetric Markov processes, extending earlier work of Fukushima-Takeda and Fitzsimmons on Girsanov transformations of “gradient type”. We investigate the most general Girsanov transformation leading to another symmetric Markov process. This investigation requires an extension of the forward-backward martingale method of Lyons-Zheng, to cover the cas...
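For reference, the classical result that such "gradient type" transformations generalize is Girsanov's theorem for Brownian motion (stated here under Novikov's integrability condition; the paper above treats far more general symmetric Markov processes):

```latex
% Change of measure for Brownian motion W under P, with adapted drift b_t:
\[
  Z_T \;=\; \exp\!\Big( \int_0^T b_t \,\mathrm{d}W_t \;-\; \tfrac{1}{2}\int_0^T \lVert b_t\rVert^2 \,\mathrm{d}t \Big),
  \qquad
  \frac{\mathrm{d}Q}{\mathrm{d}P}\Big|_{\mathcal{F}_T} \;=\; Z_T ,
\]
\[
  \widetilde{W}_t \;=\; W_t - \int_0^t b_s \,\mathrm{d}s
  \quad\text{is a Brownian motion under } Q .
\]
```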


Scaling Reinforcement Learning Paradigms for Motor Control

Reinforcement learning offers a general framework to explain reward-related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high-dimensional movement systems and mainly operate in discrete, low-dimensional domains like game playing, artificial toy problems, etc. This drawback makes them unsuitable for application to human or ...


Multi-Batch Experience Replay for Fast Convergence of Continuous Action Control

Policy gradient methods for direct policy optimization are widely used to obtain optimal policies in continuous Markov decision process (MDP) environments. However, policy gradient methods require exponentially many samples as the dimension of the action space increases. Thus, off-policy learning with experience replay has been proposed to enable the agent to learn by using samples of other pol...
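As a rough illustration of the experience-replay idea mentioned here (a generic sketch, not this paper's multi-batch algorithm), a buffer stores transitions generated by earlier policies, and several mini-batches can be drawn from it per update step instead of relying only on fresh rollouts:

```python
import random
from collections import deque

# Generic replay buffer sketch; the names and sampling scheme are illustrative.
class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample_batches(self, batch_size, n_batches):
        """Draw several mini-batches per update ("multi-batch" reuse of old samples)."""
        pool = list(self.buffer)
        return [random.sample(pool, batch_size) for _ in range(n_batches)]
```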


Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation

Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces; they update the policy parameters along the direction of steepest ascent of the expected return. However, the large variance of policy gradient estimates often destabilizes the policy update. In this paper, we propose to suppress the variance of gradient estimation by directly employing the var...


Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

In this paper, we propose an efficient algorithm for estimating the natural policy gradient using parameter-based exploration; this algorithm samples directly in the parameter space. Unlike previous methods based on natural gradients, our algorithm calculates the natural policy gradient using the inverse of the exact Fisher information matrix. The computational cost of this algorithm is equal t...
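A hedged sketch of the idea described above, under assumptions of my own (the factorized Gaussian hyper-policy, baseline, and learning rate are illustrative, not this paper's exact algorithm): in parameter-based exploration the policy parameters themselves are sampled from a Gaussian, whose Fisher information matrix is diagonal and known in closed form, so the natural gradient is the vanilla likelihood-ratio gradient rescaled elementwise.

```python
import numpy as np

# Parameter-based exploration with a factorized Gaussian hyper-policy
# theta ~ N(mu, diag(sigma^2)). The exact Fisher information is diagonal
# (1/sigma^2 for mu, 2/sigma^2 for sigma), so no matrix inversion is needed.
def natural_pgpe_step(mu, sigma, episode_return, lr=0.05, n_samples=20, rng=None):
    """One natural-gradient update of (mu, sigma).

    episode_return(theta) -> float is a user-supplied black box returning the
    return of running the deterministic policy with parameters theta.
    """
    rng = np.random.default_rng() if rng is None else rng
    thetas = rng.normal(mu, sigma, size=(n_samples, mu.size))
    returns = np.array([episode_return(t) for t in thetas])
    adv = returns - returns.mean()                     # baseline for variance reduction

    # Vanilla likelihood-ratio gradients of the expected return w.r.t. mu and sigma.
    grad_mu = (adv[:, None] * (thetas - mu) / sigma ** 2).mean(axis=0)
    grad_sigma = (adv[:, None] * ((thetas - mu) ** 2 - sigma ** 2) / sigma ** 3).mean(axis=0)

    # Natural gradient: multiply by the inverse of the exact (diagonal) Fisher matrix.
    nat_mu = grad_mu * sigma ** 2
    nat_sigma = grad_sigma * sigma ** 2 / 2.0

    return mu + lr * nat_mu, np.maximum(sigma + lr * nat_sigma, 1e-3)
```

Because the Fisher matrix of the factorized Gaussian is diagonal, the rescaling costs no more than the vanilla gradient itself, which is consistent with the computational-cost claim in the snippet above.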




Published: 2012